ssDNA accessibility of Rad51 is regulated by orchestrating multiple RPA dynamics

The eukaryotic single-stranded DNA (ssDNA)-binding protein Replication Protein A (RPA) plays a crucial role in various DNA metabolic pathways, including DNA replication and repair, by dynamically associating with ssDNA. While the binding of a single RPA molecule to ssDNA has been thoroughly studied, the accessibility of ssDNA is largely governed by the bimolecular behavior of RPA, the biophysical nature of which remains unclear. In this study, we develop a three-step low-complexity ssDNA Curtains method, which, when combined with biochemical assays and a Markov chain model in non-equilibrium physics, allow us to decipher the dynamics of multiple RPA binding to long ssDNA. Interestingly, our results suggest that Rad52, the mediator protein, can modulate the ssDNA accessibility of Rad51, which is nucleated on RPA coated ssDNA through dynamic ssDNA exposure between neighboring RPA molecules. We find that this process is controlled by the shifting between the protection mode and action mode of RPA ssDNA binding, where tighter RPA spacing and lower ssDNA accessibility are favored under RPA protection mode, which can be facilitated by the Rfa2 WH domain and inhibited by Rad52 RPA interaction.

To elucidate the biophysical mechanism of RPA dynamic binding to long ssDNA, we began by establishing a continuous-time discrete Markov chain model ( Fig. 4a-b).
Random sequential adsorption (RSA) models were initially proposed by Paul Flory 1 to describe irreversible adsorption processes of large molecules to a liquid-solid interface. If the interface is chosen to be one-dimensional (1D) and a finite length, the so-called 1D RSA model will provide an ideal basis for further modelling the processes of protein binding to DNA. Because this model precisely captures a key property, in which each nucleotide (nt) of ssDNA cannot be occupied by more than two protein molecules. This property differentiates DNA-relevant reactions from elementary chemical reactions described by the mass action law. And more importantly, it leads to incomplete occupation even if the proteins are oversaturated. Relevant theoretical consequences of applying RSA-like model to DNA-protein reaction kinetics are also known as the McGhee-von Hippel model [2][3][4] . This type of model has been applied to study similar biophysics settings 5 , where a standard mean field approximation is employed to obtain accurate parameters of the kinetic parameters. In order to reveal finer structures such as the gap distribution, we implemented an exact stochastic sampling approach to this model. In our work of RPA, we adapted the model according to our current knowledge of RPA binding modes to obtain full elements simulation results in which multiple binding modes and volume exclusion effects are all taken into consideration. Based on the previous references 6-8 , we assumed that RPA has two binding modes, 20-nt mode and 30-nt mode, representing partial binding mode (PBM) and full-length binding mode (FLBM) (Fig. 4a). RPA initially binds to a 20-nt ssDNA with a rate of ! (unit s -1 , the concentration of DNA and RPA incorporated), and dissociates to ssDNA with a rate of "! (unit s -1 ). This is the 20-nt mode. ! is assumed to be constant in the model because ssDNA Curtains maintain a laminar flow of RPA, which allows the concentration of free RPA to be constant. One possible scenario for the 20-nt mode is the DBD-A, DBD-B, and DBD-C together binding to ssDNA 7 . The DBD-D subsequent binding leads to the 30-nt mode (Fig. 4a). When an RPA molecule is in the 20-nt mode, DBD-D starts to bind to an extra 10-nt ssDNA with a rate of # (unit s -1 ), and dissociates to ssDNA with a rate of "# (unit s -1 ). This is the 30-nt mode (FLBM). The recent reference 7,9 suggested that the 30-nt mode is related to two different conformations, which are the extended conformation or the U-shape conformation (Fig.   4a). Here, we need to highlight the polarity of RPA binding to ssDNA. RPA always aligns along DNA in the same direction 7 .
Due to the existence of multiple binding modes, RPA molecules exhibit interesting kinetic features such as facilitated exchange and desorption. With the advancement of computational power, we can now use the novel dynamic language Julia 10 to carry out exact stochastic simulations that allow for all the possible behaviors of RPA-ssDNA interactions that are known to us. The relevant codes are available through link (https://github.com/hsianktin/RPA_model).
In this model (Fig. 4b), the state of ssDNA fragment is represented by a vector of length , taking values in 0,1, where 0 represents that this nt is not occupied and 1 represents the nt being occupied. Each RPA must take up = 20 nts for initial binding with DNA. And if the local state of DNA permits, it can further occupy another Δ = 10 nts in the 3' direction. We assume that each consecutive unoccupied segment of length recruits one RPA at the rate of ! (unit s -1 , the concentration of RPA incorporated), and that the total binding rate of RPA standing for the number of consecutive unoccupied segments of length on the ssDNA. For further occupying another 10 nts, we assign a rate parameter # and calculate the overall rate by where ( represents the leftmost position of the th bound RPA in the 20-nt mode.
In terms of the unbinding pathway, we assume that the 30-nt mode must be reopened into the 20-nt mode before detaching from DNA, and the rate is assumed to be "# . And the desorption rate for the 20-nt mode is denoted as "! . The overall rates "# and "! are calculated according to the following formula: let ( iterate over all 30nt mode RPAs and let ( iterate over all 20-nt mode RPAs as before, For each moment, the total possible reaction rate ,-, (s -1 ) is: Summation is taken over all possible reactions under the specific configuration. All the underlying reactions occur stochastically according to their exponentially distributed waiting times whose distribution parameter being corresponding reaction rates. In other words, the waiting time between two adjacent reactions (s) follows an exponential distribution, which is a probability density function (pdf): And the mean value of (s) is: After each reaction happens, the state of DNA is changed, and the number of possible reactions should be evaluated again based on the current state of DNA.
Gillespie algorithm 11 is applied to the system for sampling trajectories from this stochastic model.
In experiments, the contour length of each ssDNA is unknown and varied.
Simulation samples the ratios of DNA covered by 20-nt mode RPA ( #$ ) and 30-nt mode RPA ( /$ ) at given time points. The ratio of naked DNA is then 0 = 1 − #$ − /$ . In order to compare simulation outcomes with experimental data, we introduce additional parameters and , such that the relative extension of ssDNA is calculated as: Then, we scale the extension of ssDNA at the 30-min time point to unit 1.
Now, ( ) is comparable with the experimental results. We compared the weighted squared differences between the mean extension-time curves of experimental data and simulation outcomes. The parameters used for simulation are determined through sampling a wide range of plausible parameters. The parameter with the least squared error is chosen (Supplementary Fig. 5).
Due to its significant computational cost, we adopted the following optimization strategy. We optimize the log10 of corresponding parameters. Within the experimental range of 40-min, and = 5,000-nt, kinetic parameters should be at least 1 × 10 "2 s -1 .
The total length L of DNA is set to be 5,000 nts to reduce computational costs while maintaining good approximation to realistic scenarios. We adopted a discrete gradient descent method to optimize the parameters. Let The choice of this weight leverages the tractability of experimental data to obtain the intuitively best fitting of the experimental observations.
The loss function we chose does not admit a probabilistic interpretation. Therefore, there is no precise confidence interval defined for each parameter. We plot the minimum weighted loss associated with every type of parameter ( Supplementary Fig.   5), to determine whether the minimum is reached or whether the range of parameters are appropriate. If the best parameter under current precision is determined, we optimize the parameter by refining the range of parameters and repeating the process.